REMARKS

Claims 1-21 are pending. Claims 1, 3-7, 9-14, 16-17, and 19-21 stand rejected
under 35 U.S.C. § 102(e) as being anticipated by U.S. Patent No. 6,523,026 to Gillis.
Claims 2, 8, 15, and 18 stand rejected under 35 U.S.C. § 103(a) as being unpatentable
over U.S. Patent No. 6,523,026 to Gillis in view of U.S. Patent No. 5,799,276 to
Komissarchik et al.

Reconsideration is requested. No new matter is added. The specification is 
amended. Claims 1, 6-7, 12-13, and 18-21 are amended. Claims 2, 8, and 15 are
canceled. The rejections are traversed. Claims 1, 3-7, 9-14, and 16-21 remain in the 
case for consideration. 

REJECTION OF CLAIMS UNDER 35 U.S.C. §§ 102(e) and 103(a) 

Referring to claim 1, the invention is directed toward a computer-implemented
method for constructing a single vector representing a semantic abstract in a 
topological vector space for a semantic content of a document on a computer system, 
the method comprising: storing a semantic content for the document in computer 
memory accessible by the computer system; identifying a directed set of concepts as a 
dictionary, the directed set including a maximal element at least one concept, and at 
least one chain from the maximal element to every concept; selecting a subset of the 
chains to form a basis for the dictionary; identifying lexemes/lexeme phrases in the 
semantic content; measuring how concretely each lexemes/lexeme phrase is 
represented in each chain in the basis and the dictionary; constructing state vectors in 
the topological vector space for the semantic content using the measures of how 
concretely each lexemes/lexeme phrase is represented in each chain in the dictionary 
and the basis; superpositioning the state vectors to construct the single vector; and 
storing the single vector as the semantic abstract for the document.

Referring to claim 7, the invention is directed toward a computer-readable 
medium containing a program to construct a single vector representing a semantic 
abstract in a topological vector space for a semantic content of a document on a 
computer system, the program comprising: storing software to store a semantic 
content for the document in computer memory accessible by the computer system; 
identification software to identify a directed set of concepts as a dictionary, the
directed set including a maximal element at least one concept, and at least one chain
from the maximal element to every concept; selection software to select a subset of 
the chains to form a basis for the dictionary; identification software to identify 
lexemes/lexeme phrases in the semantic content; measurement software to measure 
how concretely each lexemes/lexeme phrase is represented in each chain in the basis 
and the dictionary; construction software to construct state vectors in the topological 
vector space for the semantic content using the measures of how concretely each 
lexemes/lexeme phrase is represented in each chain in the dictionary and the basis; 
superpositioning software to superposition the state vectors to construct the single 
vector; and storing software to store the single vector as the semantic abstract for the 
document. 

Referring to claim 13, the invention is directed toward an apparatus on a 
computer system to construct a single vector representing a semantic abstract in a 
topological vector space for a semantic content of a document on a computer system, 
the apparatus comprising: a semantic content stored in a memory of the computer 
system; a lexeme identifier adapted to identify lexemes/lexeme phrases in the 
semantic content; a state vector constructor for constructing state vectors in the 
topological vector space for each lexeme/lexeme phrase identified by the lexeme 
identifier, the state vectors measuring how concretely each lexeme/lexeme phrase 
identified by the lexeme identifier is represented in each chain in a basis and a 
dictionary, the dictionary including a directed set of concepts including a maximal 
element and at least one chain from the maximal element to every concept in the 
directed set, the basis including a subset of chains in the directed set; and a 
superpositioning unit adapted to superposition the state vectors into a single vector as 
the semantic abstract. 

Referring to claim 19, the invention is directed toward a computer-implemented
method for constructing minimal vectors representing a semantic
abstract in a topological vector space for a semantic content of a document on a 
computer system, the method comprising: storing a semantic content for the document 
in computer memory accessible by the computer system; identifying a directed set of 
concepts as a dictionary, the directed set including a maximal element at least one 
concept, and at least one chain from the maximal element to every concept; selecting
a subset of the chains to form a basis for the dictionary; identifying lexemes/lexeme
phrases in the semantic content; measuring how concretely each lexemes/lexeme 
phrase is represented in each chain in the basis and the dictionary; constructing state 
vectors in the topological vector space for the semantic content using the measures of 
how concretely each lexemes/lexeme phrase is represented in each chain in the 
dictionary and the basis; locating clumps of state vectors in the topological vector
space; superpositioning the state vectors within each clump to form a single vector 
representing the clump; collecting the single vectors representing each clump to form 
the minimal vectors; and storing the minimal vectors as the semantic abstract for the 
document. 

Referring to claim 20, the invention is directed toward a computer-readable 
medium containing a program to construct minimal vectors representing a semantic 
abstract in a topological vector space for a semantic content of a document on a 
computer system, the program comprising: storing software to store a semantic 
content for the document in computer memory accessible by the computer system; 
identification software to identify a directed set of concepts as a dictionary, the 
directed set including a maximal element at least one concept, and at least one chain 
from the maximal element to every concept; selection software to select a subset of the
chains to form a basis for the dictionary; identification software to identify 
lexemes/lexeme phrases in the semantic content; measurement software to measure 
how concretely each lexemes/lexeme phrase is represented in each chain in the basis 
and the dictionary; construction software to construct state vectors in the topological 
vector space for the semantic content using the measures of how concretely each 
lexemes/lexeme phrase is represented in each chain in the dictionary and the basis; 
clump location software to locate clumps of state vectors in the topological vector 
space; superpositioning software to superposition the state vectors within each clump 
to form a single vector representing the clump; collection software to collect the 
single vectors representing each clump to form the minimal vectors; and storing 
software to store the minimal vectors as the semantic abstract for the document. 

Referring to claim 21, the invention is directed toward an apparatus on a 
computer system to construct minimal vectors representing a semantic abstract in a 
topological vector space for a semantic content of a document on a computer system,
the apparatus comprising: a semantic content stored in a memory of the computer
system; a state vector constructor for constructing state vectors in the topological
vector space for each lexeme/lexeme phrase in the semantic content, the state vectors
measuring how concretely each lexeme/lexeme phrase is represented in each chain in 
a basis and a dictionary, the dictionary including a directed set of concepts including a 
maximal element and at least one chain from the maximal element to every concept in 
the directed set, the basis including a subset of chains in the directed set; a clump 
locator unit adapted to locate clumps of state vectors in the topological vector space; a 
superpositioning unit adapted to superposition the state vectors within each clump 
into a single vector representing the clump; and a collection unit adapted to collect the 
single vectors representing the clump into the minimal vectors of the semantic 
abstract. 

In contrast, Gillis teaches a system and method for retrieving semantically 
distant analogies. Gillis begins by initializing vectors for each term. The vectors are 
initialized with random values for each component. That way, dot products of pairs
of vectors are likely to be close to zero, approximating no relationship between the 
terms. Then the system works through documents, learning about the terms. The 
system applies learning laws that correlate nearby words in documents, changing 
vectors to account for word proximity. The process is repeated until the vectors are 
stable, at which point they represent the semantics of the words. The system can then 
construct summary vectors by adding up and normalizing the sum of all vectors for 
which terms in the document can be found. The summary vector can then be used to 
locate semantically distant analogies. 

It is clear from the above description that Gillis constructs its vectors in a very 
specific manner, by iteratively scanning a document and applying learning laws. The 
vectors are initialized with random values, and are modified using the learning laws. 
In contrast, the vectors in the claimed invention are constructed using basis chains in a 
directed set, and measuring how concretely a given term is represented in the basis 
chains. Support for these features can be found in U.S. Patent Application Serial No.
09/512,963, titled "CONSTRUCTION, MANIPULATION, AND COMPARISON
OF A MULTI-DIMENSIONAL SEMANTIC SPACE", filed February 25, 2004. This
patent application is a continuation-in-part of, and incorporates by reference, U.S.
Patent Application Serial No. 09/615,726, titled "METHOD AND MECHANISM
FOR THE CREATION, MAINTENANCE, AND COMPARISON OF SEMANTIC
ABSTRACTS", filed July 13, 2000, which is a continuation-in-part of, and
incorporates by reference, U.S. Patent Application Serial No. 09/512,963, titled
"CONSTRUCTION, MANIPULATION, AND COMPARISON OF A
MULTI-DIMENSIONAL SEMANTIC SPACE", filed February 25, 2004. (U.S. Patent
Application Serial No. 09/615,726 was amended in a Response to Office Action filed
April 16, 2004 to include the above-mentioned claim of priority; the priority claim 
was not in the application as originally filed.) The Examiner is referred to ancestor 
U.S. Patent Application Serial No. 09/512,963 for more information, specifically with 
respect to pages 11-18, wherein the concepts of chains, bases, and how concepts can 
be measured relative to basis chains are all discussed; copies of these pages and FIGs. 
4-5G are attached for the Examiner's reference. 

Because neither Gillis nor Komissarchik teaches or suggests vector construction
according to the features claimed, claims 1, 3-7, 9-14, and 16-21 are patentable under
35 U.S.C. § 102(e) over Gillis and under 35 U.S.C. § 103(a) over Gillis in view of
Komissarchik. Accordingly, claims 1, 3-7, 9-14, and 16-21 are allowable.

For the foregoing reasons, reconsideration and allowance of claims 1, 3-7,
9-14, and 16-21 of the application as amended is solicited. The Examiner is encouraged
to telephone the undersigned at (503) 222-3613 if it appears that an interview would 
be helpful in advancing the case. 



Respectfully submitted, 

MARGER JOHNSON & McCOLLOM, P.C. 




Ariel S. Rogson
Reg. No. 43,054 



MARGER JOHNSON & McCOLLOM, P.C. 

1030 SW Morrison Street 

Portland, OR 97205 

503-222-3613

Customer No. 20575 




I hereby certify that this correspondence 
is being transmitted to the U.S. Patent and 
Trademark Office via facsimile number 



Jeanne Bower

An Example Topology

Consider an actual topology on the set P of predicates. This is accomplished by 
exploiting the notion of hyponymy and meaning postulates. 

Let P be the set of predicates, and let B be the set of all elements of $2^{2^P}$, i.e.,
$\wp(\wp(P))$, that express hyponymy. B is a basis, if not of $2^P$, i.e., $\wp(P)$, then at least of
everything worth talking about: $S = \bigcup \{b : b \in B\}$. If $b_\alpha, b_\gamma \in B$, neither containing the other,
have a non-empty intersection that is not already an explicit hyponym, extend the basis B
with the meaning postulate $b_\alpha \cap b_\gamma$. For example, "dog" is contained in both "carnivore" and
"mammal." So, even though the core lexicon may not include an entry equivalent to
"carnivorous mammal," it is a worthy meaning postulate, and the lexicon can be extended to
include the intersection. Thus, B is a basis for S.
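
The extension step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each basis element is modeled as the set of words it contains, and the function name `extend_basis`, the naming of new entries, and the three-word toy lexicon are all hypothetical rather than taken from the application.

```python
from itertools import combinations

def extend_basis(basis):
    """Add pairwise intersections as new meaning postulates until none are missing."""
    extended = {name: frozenset(members) for name, members in basis.items()}
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so new entries can be added safely.
        for (name_a, set_a), (name_b, set_b) in combinations(list(extended.items()), 2):
            if set_a <= set_b or set_b <= set_a:
                continue  # one contains the other: already related by hyponymy
            common = set_a & set_b
            if common and common not in extended.values():
                extended[f"{name_a} & {name_b}"] = common
                changed = True
    return extended

# Hypothetical toy lexicon: "dog" and "cat" fall under both entries.
basis = {
    "carnivore": {"dog", "cat", "shark"},
    "mammal": {"dog", "cat", "cow"},
}
extended = extend_basis(basis)
print(sorted(extended["carnivore & mammal"]))
```

Here the new entry plays the role of "carnivorous mammal": it is not in the starting lexicon but is generated from the overlap of two entries that do not contain one another.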

Because hyponymy is based on nested subsets, there is a hint of partial ordering on S. 
A partial order would be a big step towards establishing a metric. 

At this point, a concrete example of a (very restricted) lexicon is in order. FIG. 3
shows a set of concepts, including "thing" 305, "man" 310, "girl" 312, "adult human" 315,
"kinetic energy" 320, and "local action" 325. "Thing" 305 is the maximal element of the set,
as every other concept is a type of "thing." Some concepts, such as "man" 310 and "girl" 312,
are "leaf concepts," in the sense that no other concept in the set is a type of "man" or "girl."
Other concepts, such as "adult human" 315, "kinetic energy" 320, and "local action" 325, are
"internal concepts," in the sense that they are types of other concepts (e.g., "local action" 325
is a type of "kinetic energy" 320) but there are other concepts that are types of these concepts
(e.g., "man" 310 is a type of "adult human" 315).

FIG. 4 shows a directed set constructed from the concepts of FIG. 3. For each
concept in the directed set, there is at least one chain extending from maximal element
"thing" 305 to the concept. These chains are composed of directed links, such as links 405,
410, and 415, between pairs of concepts. In the directed set of FIG. 4, every chain from
maximal element "thing" must pass through either "energy" 420 or "category" 425. Further,
there can be more than one chain extending from maximal element "thing" 305 to any
concept. For example, there are four chains extending from "thing" 305 to "adult human"
315: two go along link 410 extending out of "being" 435, and two go along link 415
extending out of "adult" 445.
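
To make the idea of multiple chains concrete, the following Python sketch enumerates every chain from the maximal element down to a chosen concept. The link table is a made-up example that reuses some concept names from the text; it is not a transcription of FIG. 4, and the helper name `chains_to` is hypothetical.

```python
def chains_to(children, maximal, target):
    """Return every chain (list of concepts) from `maximal` down to `target`."""
    if maximal == target:
        return [[target]]
    found = []
    for child in children.get(maximal, []):
        for tail in chains_to(children, child, target):
            found.append([maximal] + tail)
    return found

# Hypothetical links: each concept maps to the more specific concepts below it.
children = {
    "thing": ["energy", "category"],
    "energy": ["matter", "behavior"],
    "matter": ["being"],
    "behavior": ["being"],
    "category": ["adult", "human"],
    "being": ["adult human"],
    "adult": ["adult human"],
    "human": ["adult human"],
    "adult human": ["man"],
}

chains = chains_to(children, "thing", "adult human")
for chain in chains:
    print(" -> ".join(chain))
```

Because a concept may have more than one parent, the structure is a directed set rather than a tree, and the number of chains reaching a concept grows with the number of distinct routes through its ancestors.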

Some observations about the nature of FIG. 4: 

• First, the model is a topological space.

• Second, note that the model is not a tree. In fact, it is an example of a directed
set. For example, concepts "being" 430 and "adult human" 315 are types of
multiple concepts higher in the hierarchy. "Being" 430 is a type of "matter" 435
and a type of "behavior" 440; "adult human" 315 is a type of "adult" 445 and a
type of "human" 450.

• Third, observe that the relationships expressed by the links are indeed relations of
hyponymy.

• Fourth, note particularly - but without any loss of generality - that "man" 310
maps to both "energy" 420 and "category" 425 (via composite mappings) which
in turn both map to "thing" 305; i.e., the (composite) relations are multiple valued
and induce a partial ordering. These multiple mappings are natural to the meaning
of things and critical to semantic characterization.

• Finally, note that "thing" 305 is maximal; indeed, "thing" 305 is the greatest
element of any quantization of the lexical semantic field (subject to the premises
of the model).

Metrizing S

FIGs. 5A-5G show eight different chains in the directed set that form a basis for the
directed set. FIG. 5A shows chain 505, which extends to concept "man" 310 through concept
"energy" 420. FIG. 5B shows chain 510 extending to concept "iguana." FIG. 5C shows
another chain 515 extending to concept "man" 310 via a different path. FIGs. 5D-5G show
other chains.

FIG. 13 shows a data structure for storing the directed set of FIG. 3, the chains of
FIG. 4, and the basis chains of FIGs. 5A-5G. In FIG. 13, concepts array 1305 is used to store
the concepts in the directed set. Concepts array 1305 stores pairs of elements. One element
identifies concepts by name; the other element stores numerical identifiers 1306. For
example, concept name 1307 stores the concept "dust," which is paired with numerical
identifier "2" 1308. Concepts array 1305 shows 9 pairs of elements, but there is no
theoretical limit to the number of concepts in concepts array 1305. In concepts array 1305,
there should be no duplicated numerical identifiers 1306. In FIG. 13, concepts array 1305 is
shown sorted by numerical identifier 1306, although this is not required. When concepts
array 1305 is sorted by numerical identifier 1306, numerical identifier 1306 can be called the
index of the concept name.

Maximal element (ME) 1310 stores the index to the maximal element in the directed
set. In FIG. 13, the concept index to maximal element 1310 is "6," which corresponds to
concept "thing," the maximal element of the directed set of FIG. 4.

Chains array 1315 is used to store the chains of the directed set. Chains array 1315
stores pairs of elements. One element identifies the concepts in a chain by index; the other
element stores a numerical identifier. For example, chain 1317 stores a chain of concept
indices "6", "5", "9", "7", and "2," and is indexed by chain index "1" (1318). (Concept index
0, which does not occur in concepts array 1305, can be used in chains array 1315 to indicate
the end of the chain. Additionally, although chain 1317 includes five concepts, the number of
concepts in each chain can vary.) Using the indices of concepts array 1305, this chain
corresponds to concepts "thing," "energy," "potential energy," "matter," and "dust." Chains
array 1315 shows one complete chain and part of a second chain, but there is no theoretical
limit to the number of chains stored in chains array 1315. Observe that, because maximal
element 1310 stores the concept index "6," every chain in chains array 1315 should begin
with concept index "6." Ordering the concepts within a chain is ultimately helpful in
measuring distances between the concepts. However, concept order is not required. Further,
there is no required order to the chains as they are stored in chains array 1315.
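
As a rough illustration of the concepts array, maximal element, and chains array described so far, the following Python sketch uses plain dictionaries and lists. It is one possible layout, not the layout of FIG. 13: only the five index-to-name pairs that can be read off chain 1317 are filled in, and the names `END_OF_CHAIN`, `chain_names`, and `chains_start_at_maximal_element` are hypothetical.

```python
END_OF_CHAIN = 0  # concept index 0 never appears in the concepts array

# Index -> concept name. FIG. 13 shows nine pairs; only the five pairs
# implied by chain 1317 are reproduced here.
concepts = {2: "dust", 5: "energy", 6: "thing", 7: "matter", 9: "potential energy"}

maximal_element = 6  # index of "thing"

# Chain index -> concept indices, terminated by the end-of-chain marker.
chains = {1: [6, 5, 9, 7, 2, END_OF_CHAIN]}

def chain_names(chain_index):
    """Translate a stored chain into concept names, stopping at the marker."""
    names = []
    for concept_index in chains[chain_index]:
        if concept_index == END_OF_CHAIN:
            break
        names.append(concepts[concept_index])
    return names

def chains_start_at_maximal_element():
    """Check the stated invariant: every chain begins with the maximal element."""
    return all(chain[0] == maximal_element for chain in chains.values())

print(chain_names(1))
print(chains_start_at_maximal_element())
```

Using a sentinel index lets chains of different lengths share one fixed-width row format, which is presumably why index 0 is kept out of the concepts array.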

Basis chains array 1320 is used to store the chains of chains array 1315 that form a
basis of the directed set. Basis chains array 1320 stores chain indices into chains array 1315.
Basis chains array 1320 shows four chains in the basis (chains 1, 4, 8, and 5), but there is no
theoretical limit to the number of chains in the basis for the directed set.

Euclidean distance matrix 1325A stores the distances between pairs of concepts in the
directed set of FIG. 4. (How distance is measured between pairs of concepts in the directed
set is discussed below. But in short, the concepts in the directed set are mapped to state
vectors in multi-dimensional space, where a state vector is a directed line segment starting at
the origin of the multi-dimensional space and extending to a point in the multi-dimensional
space.) The distance between the end points of pairs of state vectors representing concepts is
measured. The smaller the distance is between the state vectors representing the concepts,
the more closely related the concepts are. Euclidean distance matrix 1325A uses the indices
1306 of the concepts array for the row and column indices of the matrix. For a given pair of
row and column indices into Euclidean distance matrix 1325A, the entry at the intersection of
that row and column in Euclidean distance matrix 1325A shows the distance between the
concepts with the row and column concept indices, respectively. So, for example, the
distance between concepts "man" and "dust" can be found at the intersection of row 1 and
column 2 of Euclidean distance matrix 1325A as approximately 1.96 units. The distance
between concepts "man" and "iguana" is approximately 1.67, which suggests that "man" is
closer to "iguana" than "man" is to "dust." Observe that Euclidean distance matrix 1325A is
symmetrical: that is, for an entry in Euclidean distance matrix 1325A with given row and
column indices, the row and column indices can be swapped, and Euclidean distance matrix
1325A will yield the same value. In words, this means that the distance between two
concepts is not dependent on concept order: the distance from concept "man" to concept
"dust" is the same as the distance from concept "dust" to concept "man."

Angle subtended matrix 1325B is an alternative way to store the distance between
pairs of concepts. Instead of measuring the distance between the state vectors representing
the concepts (see below), the angle between the state vectors representing the concepts is
measured. This angle will vary between 0 and 90 degrees. The narrower the angle is
between the state vectors representing the concepts, the more closely related the concepts are.
As with Euclidean distance matrix 1325A, angle subtended matrix 1325B uses the indices
1306 of the concepts array for the row and column indices of the matrix. For a given pair of
row and column indices into angle subtended matrix 1325B, the entry at the intersection of
that row and column in angle subtended matrix 1325B shows the angle subtended by the state
vectors for the concepts with the row and column concept indices, respectively. For example,
the angle between concepts "man" and "dust" is approximately 51 degrees, whereas the angle
between concepts "man" and "iguana" is approximately 42 degrees. This suggests that
"man" is closer to "iguana" than "man" is to "dust." As with Euclidean distance matrix
1325A, angle subtended matrix 1325B is symmetrical.
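
Because both matrices are symmetrical, one way to hold them is to keep only the entries below the main diagonal. The Python sketch below shows that idea under some assumptions of its own: the class name `SymmetricDistances` is hypothetical, positions are 0-based (so concept indices 1 and 2 become positions 0 and 1), and a concept is taken to be at distance 0 from itself.

```python
class SymmetricDistances:
    """Store only the entries below the main diagonal of a symmetric matrix."""

    def __init__(self, size):
        self.size = size
        # A size x size symmetric matrix has size * (size - 1) / 2 such entries.
        self.values = [0.0] * (size * (size - 1) // 2)

    def _offset(self, row, col):
        if row == col:
            raise IndexError("diagonal entries are not stored")
        if row < col:
            row, col = col, row  # always address the lower triangle
        return row * (row - 1) // 2 + col

    def set(self, row, col, value):
        self.values[self._offset(row, col)] = value

    def get(self, row, col):
        if row == col:
            return 0.0  # assumed: a concept is at distance 0 from itself
        return self.values[self._offset(row, col)]

matrix = SymmetricDistances(9)   # nine concepts, as in concepts array 1305
matrix.set(0, 1, 1.96)           # "man" (index 1) and "dust" (index 2)
print(matrix.get(1, 0))          # same entry, indices swapped
```

With nine concepts this stores 36 values instead of 81, at the cost of a small index calculation on every lookup.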

Not shown in FIG. 13 is a data structure component for storing state vectors
(discussed below). As state vectors are used in calculating the distances between pairs of
concepts, if the directed set is static (i.e., concepts are not being added or removed and basis
chains remain unchanged), the state vectors are not required after distances are calculated.
Retaining the state vectors is useful, however, when the directed set is dynamic. A person
skilled in the art will recognize how to add state vectors to the data structure of FIG. 13.

Although the data structure for concepts array 1305, maximal element 1310, chains
array 1315, and basis chains array 1320 in FIG. 13 are shown as arrays, a person skilled in
the art will recognize that other data structures are possible. For example, concepts array
1305 could store the concepts in a linked list, maximal element 1310 could use a pointer to point to
the maximal element in concepts array 1305, chains array 1315 could use pointers to point to
the elements in concepts array 1305, and basis chains array 1320 could use pointers to point to
chains in chains array 1315. Also, a person skilled in the art will recognize that the data in
Euclidean distance matrix 1325A and angle subtended matrix 1325B can be stored using
other data structures. For example, a symmetric matrix can be represented using only
one half the space of a full matrix if only the entries below the main diagonal are preserved and
the row index is always larger than the column index. Further space can be saved by
computing the values of Euclidean distance matrix 1325A and angle subtended matrix 1325B
"on the fly" as distances and angles are needed.

Returning to FIGs. 5A-5G, how are distances and angles subtended measured? The
chains shown in FIGs. 5A-5G suggest that the relation between any node of the model and
the maximal element "thing" 305 can be expressed as any one of a set of composite functions;
one function for each chain from the minimal node $\mu$ to "thing" 305 (the $n$th predecessor of $\mu$
along the chain):

$f: \mu \Rightarrow \text{thing} = f_1 \circ f_2 \circ f_3 \circ \cdots \circ f_n$

where the chain connects $n + 1$ concepts, and $f_j$ links the $(n - j)$th predecessor of $\mu$ with the
$(n + 1 - j)$th predecessor of $\mu$, $1 \le j \le n$. For example, with reference to FIG. 5A, chain 505
connects nine concepts. For chain 505, $f_1$ is link 505A, $f_2$ is link 505B, and so on through
$f_8$ being link 505H.

Consider the set of all such functions for all minimal nodes. Choose a countable
subset $\{f_k\}$ of functions from the set. For each $f_k$ construct a function $g_k: S \to I^1$ as follows.
For $s \in S$, $s$ is in relation (under hyponymy) to "thing" 305. Therefore, $s$ is in relation to at
least one predecessor of $\mu$, the minimal element of the (unique) chain associated with $f_k$.
Then there is a predecessor of smallest index (of $\mu$), say the $m$th, that is in relation to $s$.
Define:

$g_k(s) = (n - m) / n$          Equation (2)

This formula gives a measure of concreteness of a concept to a given chain associated with
function $f_k$.

As an example of the definition of $g_k$, consider chain 505 of FIG. 5A, for which $n$ is 8.
Consider the concept "cat" 555. The smallest predecessor of "man" 310 that is in relation to
"cat" 555 is "being" 430. Since "being" 430 is the fourth predecessor of "man" 310, $m$ is 4,
and $g_k$("cat" 555) = (8 - 4) / 8 = 1/2. "Iguana" 560 and "plant" 560 similarly have $g_k$ values of
1/2. But the only predecessor of "man" 310 that is in relation to "adult" 445 is "thing" 305
(which is the eighth predecessor of "man" 310), so $m$ is 8, and $g_k$("adult" 445) = 0.
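
The measure in Equation (2) can be sketched in Python as follows. The sketch assumes a chain listed from the maximal element down to the minimal node and a lookup table giving, for each concept, the concepts it is a type of (itself included). The five-concept chain and the table are hypothetical and much shorter than chain 505; the function name `concreteness` is likewise an assumption.

```python
from fractions import Fraction

def concreteness(chain, related_to, concept):
    """Compute (n - m) / n for one chain, as in Equation (2)."""
    n = len(chain) - 1                 # the chain connects n + 1 concepts
    for m in range(n + 1):             # m = 0 is the minimal node itself
        predecessor = chain[n - m]     # the m-th predecessor of the minimal node
        if predecessor in related_to[concept]:
            return Fraction(n - m, n)  # first hit has the smallest index m
    raise ValueError("concept is not related to the maximal element")

# Hypothetical chain, listed from the maximal element to the minimal node.
chain = ["thing", "energy", "being", "adult human", "man"]

# Hypothetical hyponymy table: concept -> everything it is a type of.
related_to = {
    "man": {"man", "adult human", "being", "energy", "thing"},
    "cat": {"cat", "being", "energy", "thing"},
    "adult": {"adult", "thing"},
}

for concept in ("man", "cat", "adult"):
    print(concept, concreteness(chain, related_to, concept))
```

In this toy chain, as in the example with chain 505, a concept that meets the chain only at "thing" scores 0, while the minimal node of the chain scores 1.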

Finally, define the vector valued function $\varphi: S \to \mathbb{R}^k$ relative to the indexed set of
scalar functions $\{g_1, g_2, g_3, \ldots, g_k\}$ (where scalar functions $\{g_1, g_2, g_3, \ldots, g_k\}$ are defined
according to Equation (2)) as follows:

$\varphi(s) = \langle g_1(s), g_2(s), g_3(s), \ldots, g_k(s) \rangle$          Equation (3)

This state vector $\varphi(s)$ maps a concept $s$ in the directed set to a point in k-space ($\mathbb{R}^k$). One can
measure distances between the points (the state vectors) in k-space. These distances provide
measures of the closeness of concepts within the directed set. The means by which distance
can be measured include distance functions, such as Equations (1a), (1b), or (1c). Further,
trigonometry dictates that the distance between two vectors is related to the angle subtended
between the two vectors, so means that measure the angle between the state vectors also
approximate the distance between the state vectors. Finally, since only the direction (and
not the magnitude) of the state vectors is important, the state vectors can be normalized to the
unit sphere. If the state vectors are normalized, then the angle between two state vectors is no
longer an approximation of the distance between the two state vectors, but rather is an exact
measure.
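
As a worked check of the last statement, assume the distance function is the ordinary Euclidean one (Equations (1a)-(1c) are not reproduced in these pages). Two unit-length state vectors $u$ and $v$ separated by an angle $\theta$ then satisfy:

```latex
\[
\lVert u - v \rVert^2
  = \lVert u \rVert^2 + \lVert v \rVert^2 - 2\,u \cdot v
  = 2 - 2\cos\theta,
\qquad\text{so}\qquad
\lVert u - v \rVert = 2\sin\!\left(\tfrac{\theta}{2}\right).
\]
```

Because $2\sin(\theta/2)$ increases steadily as $\theta$ goes from 0 to 180 degrees, ranking pairs of normalized state vectors by angle gives the same ordering as ranking them by distance.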
The functions $g_k$ are analogous to step functions, and in the limit (of refinements of
the topology) the functions are continuous. Continuous functions preserve local topology;
i.e., "close things" in S map to "close things" in $\mathbb{R}^k$, and "far things" in S tend to map to "far
things" in $\mathbb{R}^k$.

Example Results

The following example results show state vectors $\varphi(s)$ using chain 505 as function $g_1$,
chain 510 as function $g_2$, and so on through chain 540 as function $g_8$.

$\varphi$("boy") => <3/4, 5/7, 4/5, 3/4, 7/9, 5/6, 1, 6/7>
$\varphi$("dust") => <3/8, 3/7, 3/10, 1, 1/9, 0, 0, 0>
$\varphi$("iguana") => <1/2, 1, 1/2, 3/4, 5/9, 0, 0, 0>
$\varphi$("woman") => <7/8, 5/7, 9/10, 3/4, 8/9, 2/3, 5/7, 5/7>
$\varphi$("man") => <1, 5/7, 1, 3/4, 1, 1, 5/7, 5/7>
Using these state vectors, the distances between concepts and the angles subtended
between the state vectors are as follows:

Pairs of Concepts        Distance (Euclidean)    Angle Subtended
"boy" and "dust"         ~1.85                   ~52°
"boy" and "iguana"       ~1.65                   ~46°
"boy" and "woman"        ~0.41                   ~10°
"dust" and "iguana"      ~0.80                   ~30°
"dust" and "woman"       ~1.68                   ~48°
"iguana" and "woman"     ~1.40                   ~39°
"man" and "woman"        ~0.39                   ~7°

From these results, the following comparisons can be seen: 

• "boy" is closer to "iguana" than to "dust."

• "boy" is closer to "iguana" than "woman" is to "dust."

• "boy" is much closer to "woman" than to "iguana" or "dust."

• "dust" is further from "iguana" than "boy" to "woman" or "man" to "woman."

• "woman" is closer to "iguana" than to "dust."

• "woman" is closer to "iguana" than "boy" is to "dust."

• "man" is closer to "woman" than "boy" is to "woman."

All other tests done to date yield similar results. The technique works consistently
well.
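
The tabulated figures can be rechecked from the five state vectors with a short Python sketch. It assumes the distance is the plain Euclidean distance between the unnormalized vectors and the angle is taken from the normalized dot product; the function names `euclidean` and `angle_degrees` are hypothetical.

```python
import math
from fractions import Fraction as F

# State vectors as listed in the example results.
STATE_VECTORS = {
    "boy":    [F(3, 4), F(5, 7), F(4, 5), F(3, 4), F(7, 9), F(5, 6), F(1), F(6, 7)],
    "dust":   [F(3, 8), F(3, 7), F(3, 10), F(1), F(1, 9), F(0), F(0), F(0)],
    "iguana": [F(1, 2), F(1), F(1, 2), F(3, 4), F(5, 9), F(0), F(0), F(0)],
    "woman":  [F(7, 8), F(5, 7), F(9, 10), F(3, 4), F(8, 9), F(2, 3), F(5, 7), F(5, 7)],
    "man":    [F(1), F(5, 7), F(1), F(3, 4), F(1), F(1), F(5, 7), F(5, 7)],
}

def euclidean(a, b):
    """Euclidean distance between the end points of two state vectors."""
    u, v = STATE_VECTORS[a], STATE_VECTORS[b]
    return math.sqrt(sum(float(x - y) ** 2 for x, y in zip(u, v)))

def angle_degrees(a, b):
    """Angle subtended by two state vectors, in degrees."""
    u, v = STATE_VECTORS[a], STATE_VECTORS[b]
    dot = float(sum(x * y for x, y in zip(u, v)))
    norm_u = math.sqrt(float(sum(x * x for x in u)))
    norm_v = math.sqrt(float(sum(y * y for y in v)))
    return math.degrees(math.acos(min(1.0, dot / (norm_u * norm_v))))

for pair in [("boy", "dust"), ("boy", "woman"), ("man", "woman")]:
    print(pair, round(euclidean(*pair), 2), round(angle_degrees(*pair), 1))
```

Exact fractions are used for the vector components so that rounding enters only at the final square root and arccosine; the computed values agree with the table to within its stated approximation.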



How It (Really) Works 

As described above, construction of the $\varphi$ transform is (very nearly) an algorithm. In
effect, this describes a recipe for metrizing a lexicon - or for that matter, metrizing anything
that can be modeled as a directed set - but does not address the issue of why it works. In
other words, what's really going on here? To answer this question, one must look to the
underlying mathematical principles.

First of all, what is the nature of S? Earlier, it was suggested that a propositional
model of the lexicon has found favor with many linguists. For example, the lexical element
"automobile" might be modeled as:

{automobile: is a machine,
             is a vehicle,
             has engine,
             has brakes,
             ...
}

In principle, there might be infinitely many such properties, though practically
speaking one might restrict the cardinality to $\aleph_0$ (countably infinite) in order to ensure that
the properties are addressable. If one were disposed to do so, one might require that there be
only finitely many properties associated with a lexical element. However, there is no
compelling reason to require finiteness.

At any rate, one can see that "automobile" is simply an element of the power set of P,
the set of all propositions; i.e., it is an element of the set of all subsets of P. The power set is
denoted as $\wp(P)$. Note that the first two properties of the "automobile" example express "is
a" relationships. By "is a" is meant entailment. Entailment means that, were one to intersect
the properties of every element of $\wp(P)$ that is called, for example, "machine," then the

FIG. 4
FIG. 5A
FIG. 5B
FIG. 5C
FIG. 5D
FIG. 5E
FIG. 5F
FIG. 5G